THE CLAIMS
What is claimed is: 



1. An apparatus for use in a computer system comprising:
    a memory having stored therein a first packed data and a second packed data; and
    a processor coupled to said memory to receive said first packed data and said second packed data, said processor performing operations on data elements in said first packed data and said second packed data to generate a plurality of data elements in a third packed data in response to receiving an instruction, at least two of said plurality of data elements in said third packed data storing the result of multiply-add operations.



2. The apparatus of claim 1, said first packed data including a first data element, a second data element, a third data element, and a fourth data element;
    said second packed data containing at least a fifth data element, a sixth data element, a seventh data element, and an eighth data element; and
    said third packed data containing at least a ninth data element and a tenth data element, said ninth data element representing the result of:
        (said first data element multiplied by said fifth data element) added to (said second data element multiplied by said sixth data element)
    said tenth data element representing the result of:
        (said third data element multiplied by said seventh data element) added to (said fourth data element multiplied by said eighth data element).
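
The arithmetic recited in claim 2 can be illustrated with a minimal sketch. This is an illustration only, not part of the claims: it assumes each packed data is modeled as a plain Python list of four integers, and the function name `packed_multiply_add` is hypothetical.

```python
def packed_multiply_add(first, second):
    """Return [ninth, tenth] for two four-element packed operands (illustrative)."""
    if len(first) != 4 or len(second) != 4:
        raise ValueError("each packed operand must hold four data elements")
    # Multiply corresponding elements: first*fifth, second*sixth,
    # third*seventh, fourth*eighth.
    products = [x * y for x, y in zip(first, second)]
    # Add the products in adjacent pairs to form the two result elements.
    ninth = products[0] + products[1]
    tenth = products[2] + products[3]
    return [ninth, tenth]


# (1*5 + 2*6) = 17 and (3*7 + 4*8) = 53
print(packed_multiply_add([1, 2, 3, 4], [5, 6, 7, 8]))  # prints [17, 53]
```

Python integers do not overflow, so this sketch ignores element widths entirely; it shows only which elements are paired.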



3. The apparatus of claim 2, said first data element, said second data element, said third data element, said fourth data element, said fifth data element, said sixth data element, said seventh data element, and said eighth data element each comprising a first predetermined number of bits; and
    said ninth data element and said tenth data element each comprising a second predetermined number of bits, said second predetermined number of bits being greater than said first predetermined number of bits.



4. The apparatus of claim 1, wherein said multiply-add operation is performed with saturation.
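
One way to read "with saturation" is sketched below. The claim does not state the element widths or the point at which clamping occurs, so the signed 32-bit result range, the clamping of the final sums only, and the names `saturate` and `multiply_add_saturating` are all assumptions made for illustration.

```python
# Assumed result range: signed 32-bit. The claim does not fix a width.
INT32_MIN = -(2 ** 31)
INT32_MAX = 2 ** 31 - 1


def saturate(value, lo=INT32_MIN, hi=INT32_MAX):
    """Clamp value to [lo, hi] instead of letting it wrap around."""
    return max(lo, min(hi, value))


def multiply_add_saturating(first, second):
    """Multiply-add on four-element operands, clamping each sum (illustrative)."""
    if len(first) != 4 or len(second) != 4:
        raise ValueError("each packed operand must hold four data elements")
    products = [x * y for x, y in zip(first, second)]
    return [
        saturate(products[0] + products[1]),
        saturate(products[2] + products[3]),
    ]


# (-32768 * -32768) * 2 = 2**31, one past INT32_MAX, so the first sum clamps.
print(multiply_add_saturating([-32768, -32768, 1, 2], [-32768, -32768, 3, 4]))
# prints [2147483647, 11]
```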

5. An apparatus for use in a computer system comprising:
    a first storage area; and
    a circuit coupled to said first storage area, said circuit multiplying a value A by a value B to generate a first intermediate result, multiplying a value C by a value D to generate a second intermediate result, multiplying a value E by a value F to generate a third intermediate result, multiplying a value G by a value H to generate a fourth intermediate result, adding said first intermediate result to said second intermediate result to generate a value I, adding said third intermediate result to said fourth intermediate result to generate a value J, and storing said value I and said value J in said first storage area as elements of a first packed data in response to an enable signal.



6. The apparatus of claim 5, said computer system further comprising;
    a second storage area coupled to said circuit, said second storage for storing said value A, said value B, said value C, and said value D as data elements of a second packed data; and
    a third storage area coupled to said circuit, said third storage for storing said value E, said value F, said value G, and said value H as data elements of a third packed data.



7. The apparatus of claim 5, said value I and said value J providing a higher precision than at least one of said value A, said value B, said value C, said value D, said value E, said value F, said value G, and said value H.



8. A computer system comprising:
    a processor; and
    a storage area coupled to said processor having stored therein,
        a multiply-add instruction for operating on a first packed data and a second packed data, said first packed data containing at least data elements A, B, C, and D each including a predetermined number of bits, said second packed data containing at least data elements E, F, G, and H each including said predetermined number of bits, said processor generating a third packed data containing at least data elements I and J in response to receiving said multiply-add instruction, said data element I equal to (A x E) + (B x F), said data element J equal to (C x G) + (D x H).

9. The computer system of claim 8, said processor further including a first register, said processor, in addition to generating said third packed data, also storing said third packed data in said first register in response to receiving said multiply-add instruction.

10. The computer system of claim 8, said processor further including:
    a first register having stored therein said first packed data; and
    a second register having stored therein said second packed data.

11. The computer system of claim 8, said storage area further having stored therein said first packed data and said second packed data.

12. The computer system of claim 8, said data elements I and J providing a higher precision than at least one of said data elements A, B, C, D, E, F, G, and H.

13. The computer system of claim 8, said data elements I and J including two times said predetermined number of bits.
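
The relationship between claims 8 and 13 (result elements twice as wide as the source elements) can be sketched at the bit level. The sketch assumes a predetermined width of 16 bits, signed elements, and element A (or E) held in the lowest-order bits of a 64-bit packed value; none of these choices is stated in the claims, and the helper names are hypothetical.

```python
ELEMENT_BITS = 16  # assumed "predetermined number of bits"


def unpack_signed(packed, bits, count):
    """Split an integer into `count` signed fields of `bits` bits, lowest first."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    fields = []
    for i in range(count):
        raw = (packed >> (i * bits)) & mask
        # Interpret the raw field as two's complement.
        fields.append(raw - (1 << bits) if raw & sign else raw)
    return fields


def pack(values, bits):
    """Pack values into one integer, `bits` bits each, lowest field first.

    Values are truncated to `bits` bits (wraparound, no saturation).
    """
    mask = (1 << bits) - 1
    packed = 0
    for i, value in enumerate(values):
        packed |= (value & mask) << (i * bits)
    return packed


def multiply_add_packed(first, second):
    """Four 16-bit elements per operand in, two 32-bit elements out."""
    a, b, c, d = unpack_signed(first, ELEMENT_BITS, 4)
    e, f, g, h = unpack_signed(second, ELEMENT_BITS, 4)
    i_elem = a * e + b * f  # I = (A x E) + (B x F)
    j_elem = c * g + d * h  # J = (C x G) + (D x H)
    return pack([i_elem, j_elem], 2 * ELEMENT_BITS)


first = pack([1000, -2000, 300, 400], ELEMENT_BITS)
second = pack([30, 40, -500, 600], ELEMENT_BITS)
result = multiply_add_packed(first, second)
print(unpack_signed(result, 2 * ELEMENT_BITS, 2))  # prints [-50000, 90000]
```

Neither -50000 nor 90000 fits in a signed 16-bit field, which is the practical reason for giving I and J twice the source width.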

14. The computer system of claim 8, said data elements A, B, C, D, E, F, G, H, I and J are either unsigned or signed data elements.

15. A processor comprising:
    a first storage for storing a first packed data containing at least an A, a B, a C, and a D data element;
    a second storage area for storing a second packed data containing at least an E, an F, a G, and an H data element;
    a multiply-add circuit including:
        a first multiplier coupled to said first storage area to receive said A and coupled to said second storage area to receive said E;
        a second multiplier coupled to said first storage area to receive said B and coupled to said second storage area to receive said F;
        a third multiplier coupled to said first storage area to receive said C and coupled to said second storage area to receive said G;
        a fourth multiplier coupled to said first storage area to receive said D and coupled to said second storage area to receive said H;
        a first adder coupled to said first multiplier and said second multiplier;
        a second adder coupled to said third multiplier and said fourth multiplier, and
    a third storage area coupled to said first adder and said second adder, said third storage area having at least a first field and a second field, said first field for storing the output of said first adder as a first data element of a third packed data, said second field for storing the output of said second adder as a second data element of said third packed data.



16. An apparatus for use in a computer system comprising:
    a first storage area having at least a first field and a second field; and
    a circuit, coupled to said first storage area, operating in response to a signal, said circuit comprising:
        a multiplication means for multiplying a value A by a value B to generate a first intermediate result, multiplying a value C by a value D to generate a second intermediate result, multiplying a value E by a value F to generate a third intermediate result, and multiplying a value G by a value H to generate a fourth intermediate result; and
        an arithmetic means for adding said first intermediate result and said second intermediate result to generate a value I, and adding said third intermediate result and said fourth intermediate result to generate a value J; and
        a storage means for storing said value I in said first field and said value J in said second field as a first packed data.

17. An apparatus for use in a computer system comprising:
    a memory having stored therein a first packed data and a second packed data each containing initial data elements, each of said initial data elements in said first packed data having a corresponding initial data element in said second packed data;
    a circuit, coupled to said first storage area, operating in response to a signal, said circuit comprising:
        a multiplication means for multiplying together said corresponding initial data elements in said first packed data and said second packed data to generate corresponding intermediate data elements, said intermediate data elements being divided into a number of sets;
        an arithmetic means for generating a plurality of result data elements, a first of said plurality of result data elements representing the sum of said intermediate result data elements in a first of said number of sets, a second of said plurality of result data elements representing the sum of said intermediate result data elements in a second of said number of sets; and
        a storage means for storing said result data elements as a third packed data in said memory.
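
Claim 17 generalizes the operation from fixed pairs to sets of intermediate data elements. A minimal sketch follows, assuming the sets are formed from consecutive intermediate elements and all have the same size; the claim does not say how the sets are chosen, and the name `multiply_add_sets` and the `set_size` parameter are hypothetical.

```python
def multiply_add_sets(first, second, set_size=2):
    """Multiply corresponding elements, then sum each set of products.

    Assumes sets are consecutive, equal-sized runs of intermediate elements.
    """
    if len(first) != len(second):
        raise ValueError("operands must hold the same number of elements")
    if set_size <= 0 or len(first) % set_size != 0:
        raise ValueError("element count must be a multiple of set_size")
    # Intermediate data elements: one product per pair of corresponding inputs.
    intermediate = [x * y for x, y in zip(first, second)]
    # Result data elements: one sum per set of intermediate elements.
    return [
        sum(intermediate[start:start + set_size])
        for start in range(0, len(intermediate), set_size)
    ]


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [1, 1, 1, 1, 2, 2, 2, 2]
print(multiply_add_sets(a, b))              # prints [3, 7, 22, 30]
print(multiply_add_sets(a, b, set_size=4))  # prints [10, 52]
```

With `set_size` equal to the full operand length, the single result is the dot product of the two operands.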

18. The apparatus of claim 17, wherein said memory includes a register for storing said third packed data.

19. The apparatus of claim 17, wherein said first packed data and said second packed data each contain at least four initial data elements, and wherein each of said sets contain at least two intermediate data elements.

20. The apparatus of claim 17, wherein said arithmetic operations are performed with saturation.

21. The apparatus of claim 17, wherein said initial data elements, said intermediate data elements, and said result data elements are each either signed or unsigned values.

22. The apparatus of claim 17, wherein said intermediate data elements and said result data elements contain twice as many bits as said initial data elements.

23. An apparatus for use in a computer system comprising:
    a memory having stored therein a first packed data and a second packed data, said first packed data storing a first plurality of sets of data elements, each of said first plurality of sets of data elements having a corresponding set of data elements in said second packed data; and
    a processor coupled to said memory to receive said first packed data and said second packed data, said circuit storing in a third storage area a plurality of data element as a third packed data in response to receiving an instruction, each of said plurality of data elements storing the dot product of one of said first plurality of sets of data elements in said first packed data and said corresponding set of data elements in said second packed data.

24. In a computer system, a method comprising the steps of:
    A) receiving an instruction; and
    B) performing the following steps in response to receiving said instruction,
        B1) multiplying together a first value and a second value to generate a first intermediate result,
        B2) multiplying together a third value and a fourth value to generate a second intermediate result,
        B3) multiplying together a fifth value and a sixth value to generate a third intermediate result,
        B4) multiplying together a seventh value and an eighth value to generate a fourth intermediate result,
        B5) adding together said first intermediate result and said second intermediate result to generate a first data element in a first packed data,
        B6) adding together said third intermediate result and said fourth intermediate result to generate a second data element in said first packed data;
        B7) storing said first packed data in a first storage area.

25. In a computer system, a method for manipulating a first packed data and a second packed data, said first packed data including A1, A2, A3, and A4 as data elements, said second packed data including B1, B2, B3, and B4 as data elements, said method comprising the steps of:
    receiving an instruction; and
    performing the following steps in response to receiving said instruction,
        performing the operation (A1 x B1) + (A2 x B2) to generate a first data element in a third packed data;
        performing the operation (A3 x B3) + (A4 x B4) to generate a second data element in said third packed data;
        storing said third packed data in a first storage area.



26. In a computer system having stored therein a first packed data and a second packed data each containing initial data elements, each of said initial data elements in said first packed data having a corresponding initial data element in said second packed data, a method for performing multiply add operations, said method comprising the steps of:
    receiving an instruction; and
    performing the following steps in response to receiving said instruction,
        multiplying together said corresponding initial data elements in said first packed data and said second packed data to generate corresponding intermediate data elements, said intermediate data elements being divided into a number of sets;
        generating a plurality of result data elements, a first of said plurality of result data elements representing the sum of said intermediate result data elements in a first of said number of sets, a second of said plurality of result data elements representing the sum of said intermediate result data elements in a second of said number of sets; and
        storing said plurality of result data elements as a third packed data in a memory.